Active learning and logarithmic opinion pools for HPSG parse selection

Authors

  • Jason Baldridge
  • Miles Osborne
Abstract

For complex tasks such as parse selection, the creation of labelled training sets can be extremely costly. Resource-efficient schemes for creating informative labelled material must therefore be considered. We investigate the relationship between two broad strategies for reducing the amount of manual labelling necessary to train accurate parse selection models: ensemble models and active learning. We show that popular active learning methods for reducing annotation costs can be outperformed by instead using a model class which uses the available labelled data more efficiently. For this, we use a simple type of ensemble model called the Logarithmic Opinion Pool (LOP). We furthermore show that LOPs themselves can benefit from active learning. As predicted by a theoretical explanation of the predictive power of LOPs, a detailed analysis of active learning using LOPs shows that component model diversity is a strong predictor of successful LOP performance. Other contributions include a novel active learning method, a justification of our simulation studies using timing information, and cross-domain verification of our main ideas using text classification.
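As a rough illustration of the ensemble idea named in the abstract, a logarithmic opinion pool combines the component models' distributions over the same candidate parses by a weighted geometric mean and then renormalises. The sketch below is a minimal version under stated assumptions: the function name `log_opinion_pool`, the equal default weights, and the toy probabilities are all hypothetical and are not taken from the paper, which may weight or train its component models differently.

```python
import math


def log_opinion_pool(distributions, weights=None):
    """Pool several distributions over the same candidates.

    Each distribution is a list of probabilities, one per candidate parse.
    The pooled score of a candidate is the weighted sum of log-probabilities,
    exponentiated and renormalised (a weighted geometric mean).
    Equal weights are an assumption made for this sketch.
    """
    n_models = len(distributions)
    if weights is None:
        weights = [1.0 / n_models] * n_models

    n_candidates = len(distributions[0])
    log_scores = []
    for k in range(n_candidates):
        # Clamp to avoid log(0) when a component assigns zero probability.
        score = sum(
            w * math.log(max(dist[k], 1e-12))
            for w, dist in zip(weights, distributions)
        )
        log_scores.append(score)

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnormalised = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Toy example: two hypothetical component models scoring three candidate parses.
model_a = [0.7, 0.2, 0.1]
model_b = [0.5, 0.4, 0.1]
pooled = log_opinion_pool([model_a, model_b])
best_parse = max(range(len(pooled)), key=pooled.__getitem__)
```

Because the pool multiplies probabilities, a candidate that any component rates as very unlikely is penalised heavily, which is one intuition for why diversity among the component models matters.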

Similar resources

Active learning for HPSG parse selection

We describe new features and algorithms for HPSG parse selection models and address the task of creating annotated material to train them. We evaluate the ability of several sample selection methods to reduce the number of annotated sentences necessary to achieve a given level of performance. Our best method achieves a 60% reduction in the amount of training material without any loss in accuracy.

Unsupervised Parse Selection for HPSG

Parser disambiguation with precision grammars generally takes place via statistical ranking of the parse yield of the grammar using a supervised parse selection model. In the standard process, the parse selection model is trained over a hand-disambiguated treebank, meaning that without a significant investment of effort to produce the treebank, parse selection is not possible. Furthermore, as t...

Active Learning and the Total Cost of Annotation

Active learning (AL) promises to reduce the cost of annotating labeled datasets for trainable human language technologies. Contrary to expectations, when creating labeled training material for HPSG parse selection and later reusing it with other models, gains from AL may be negligible or even negative. This has serious implications for using AL, showing that additional cost-saving strategies ma...

Parse Selection with a German HPSG Grammar

We report on some recent parse selection experiments carried out with GG, a large-scale HPSG grammar for German. Using a manually disambiguated treebank derived from the Verbmobil corpus, we achieve over 81% exact match accuracy compared to a 21.4% random baseline, corresponding to an error reduction rate of 3.8.

Ensemble-based Active Learning for Parse Selection

Supervised estimation methods are widely seen as being superior to semi and fully unsupervised methods. However, supervised methods crucially rely upon training sets that need to be manually annotated. This can be very expensive, especially when skilled annotators are required. Active learning (AL) promises to help reduce this annotation cost. Within the complex domain of HPSG parse selection, ...

Journal:
  • Natural Language Engineering

Volume 14

Publication date: 2008